Reinforcement Learning Example

This is a simple example of Reinforcement Learning using Python and the OpenAI Gym library. It trains a tabular Q-learning agent on the FrozenLake environment.

Reinforcement Learning Overview

Reinforcement Learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent takes actions in the environment, receives feedback in the form of rewards or penalties, and uses this feedback to improve its decision-making over time. The goal is for the agent to learn a policy that maximizes the cumulative reward.
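
To make "cumulative reward" concrete, here is a minimal sketch of a discounted return, where rewards that arrive later count for less. The function name `discounted_return` and the sample reward sequence are illustrative assumptions; the discount factor 0.95 is the value used in the source code further down.

```python
def discounted_return(rewards, discount_factor):
    """Sum of rewards, where the reward at step t is weighted by discount_factor**t."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + discount_factor * G_{t+1}
    for reward in reversed(rewards):
        total = reward + discount_factor * total
    return total


# Hypothetical episode: two steps with no reward, then the goal is reached.
episode_rewards = [0.0, 0.0, 1.0]
episode_return = discounted_return(episode_rewards, discount_factor=0.95)
print(f"Discounted return: {episode_return:.4f}")
```

With a discount factor of 1 the function reduces to a plain sum; smaller values make the agent prefer rewards that arrive sooner.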

Key concepts of Reinforcement Learning:

Agent: The entity making decisions in the environment.
Environment: The external system with which the agent interacts.
State: The current situation or configuration of the environment.
Action: The decision or move made by the agent in a given state.
Reward: The feedback signal received by the agent after taking an action in a given state.
Policy: The strategy or mapping from states to actions that the agent learns.
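
The concepts above can be sketched as a small interaction loop. This is only an illustration under stated assumptions: `CorridorEnv`, `run_episode`, and the `always_right` policy are hypothetical names made up for this sketch, and the toy environment is not part of OpenAI Gym.

```python
class CorridorEnv:
    """Hypothetical environment: a corridor with states 0..4 and the goal at state 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # Reward only when the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act in the environment by following a state -> action mapping."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy[state]                  # Policy: maps the state to an action
        state, reward, done = env.step(action)  # Environment returns new state and reward
        total_reward += reward
        if done:
            break
    return total_reward


# A fixed policy that always moves right, written as a state -> action dictionary.
always_right = {state: 1 for state in range(5)}
episode_reward = run_episode(CorridorEnv(), always_right)
print(f"Total reward: {episode_reward}")
```

Here the policy is fixed by hand; in the Q-learning example it is learned from the rewards instead.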

Reinforcement Learning is applied in various domains, including robotics, game playing, and autonomous systems.

Python Source Code:

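# Import necessary libraries
# Note: this script uses the classic Gym API (gym < 0.26), in which
# env.reset() returns the state and env.step() returns four values.
import gym
import numpy as np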

# Create the FrozenLake environment
env = gym.make('FrozenLake-v1')

# Initialize the Q-table with zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])

# Set hyperparameters
learning_rate = 0.8
discount_factor = 0.95
num_episodes = 2000

# Implement the Q-learning algorithm
for episode in range(num_episodes):
    state = env.reset()
    total_reward = 0
    done = False
    
    while not done:
        # Choose an action using an epsilon-greedy strategy (epsilon fixed at 0.5)
        if np.random.rand() < 0.5:
            action = env.action_space.sample()  # Exploration
        else:
            action = np.argmax(Q[state, :])  # Exploitation

        # Take the chosen action and observe the next state and reward
        next_state, reward, done, _ = env.step(action)

        # Update the Q-value using the Q-learning formula
        Q[state, action] = (1 - learning_rate) * Q[state, action] + \
                           learning_rate * (reward + discount_factor * np.max(Q[next_state, :]))

        # Move to the next state
        state = next_state

        # Accumulate the total reward for this episode
        total_reward += reward

    # Print the total reward of every 100th episode
    if (episode + 1) % 100 == 0:
        print(f"Episode {episode + 1}/{num_episodes}, Total Reward: {total_reward}")

# Evaluate the learned policy
total_reward = 0
num_eval_episodes = 10

for _ in range(num_eval_episodes):
    state = env.reset()
    done = False

    while not done:
        action = np.argmax(Q[state, :])
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        state = next_state

# Print the average reward over evaluation episodes
average_reward = total_reward / num_eval_episodes
print(f"Average Reward over {num_eval_episodes} Evaluation Episodes: {average_reward}")
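
The Q-value update used in the training loop above can be checked by hand on single numbers. The sketch below isolates it in a helper; the name `q_update` and the sample values are assumptions made for illustration, while the default learning rate and discount factor are the ones set in the script.

```python
def q_update(old_value, reward, best_next_value, learning_rate=0.8, discount_factor=0.95):
    """One tabular Q-learning update, written the same way as in the training loop."""
    target = reward + discount_factor * best_next_value
    return (1 - learning_rate) * old_value + learning_rate * target


# Case 1: the step reaches the goal (reward 1) from a state-action pair whose value is still 0.
after_goal = q_update(old_value=0.0, reward=1.0, best_next_value=0.0)

# Case 2: a step with no reward that leads to a state whose best Q-value is 0.8.
after_neighbor = q_update(old_value=0.5, reward=0.0, best_next_value=0.8)

print(f"{after_goal:.3f} {after_neighbor:.3f}")
```

The second case shows how value spreads backwards from the goal: the step earns no reward itself, yet its Q-value rises because the next state is already known to be valuable.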

Explanation:

Import Libraries: Import the necessary Python libraries: OpenAI Gym for the RL environment and NumPy for the Q-table.
Create Environment: Create the FrozenLake environment from OpenAI Gym.
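Initialize Q-Table: Initialize the Q-table with zeros. Each entry holds the current estimate of the expected cumulative discounted reward for taking a specific action in a specific state.
Set Hyperparameters: Define the learning rate, the discount factor, and the number of training episodes. The exploration rate (epsilon) is hard-coded as 0.5 inside the training loop.
Implement Q-Learning: Implement the Q-learning algorithm. At each step the agent either explores (takes a random action) or exploits (takes the action with the highest Q-value for the current state), then updates the Q-table from the observed reward and next state.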
Evaluate Learned Policy: Evaluate the learned policy by running the agent in the environment and calculating the average reward over evaluation episodes.
Print Results: Print the total reward of every 100th training episode and the average reward over the evaluation episodes.
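
The two ways of choosing an action described in the steps above, exploring during training and acting greedily during evaluation, can be sketched as a pair of helpers. This is a minimal illustration that uses plain Python lists instead of NumPy; `epsilon_greedy`, `greedy_policy`, and the hand-written `toy_q_table` are hypothetical names and values, not part of the script.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # Exploration
    # Exploitation: index of the largest Q-value (the first one wins on ties)
    return max(range(len(q_row)), key=lambda action: q_row[action])


def greedy_policy(q_table):
    """Best-known action for every state, i.e. epsilon-greedy with epsilon = 0."""
    return [epsilon_greedy(row, epsilon=0.0) for row in q_table]


# Hypothetical Q-table with 3 states and 2 actions, values chosen by hand.
toy_q_table = [
    [0.1, 0.7],
    [0.4, 0.2],
    [0.0, 0.0],
]
print(greedy_policy(toy_q_table))
```

Setting epsilon to 0.5 reproduces the training behaviour of the script, and setting it to 0 reproduces the evaluation behaviour.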